Differences in Assumptions of Normality, Heteroscedasticity, and Multicollinearity in Linear Regression Analysis

By Kanda Data / December 30, 2024 / Category: Assumptions of Linear Regression

If you analyze research data using linear regression, it is crucial to understand the assumptions the method requires. Testing these assumptions helps ensure that the analysis results are consistent and unbiased.

If you have studied econometrics, one of your lecturers has probably emphasized the importance of fulfilling the assumptions of linear regression estimated with the OLS (Ordinary Least Squares) method. A series of assumption tests is needed to ensure that the estimator is the Best Linear Unbiased Estimator (BLUE).

Because these assumption tests are so important, I have written this article to discuss the core assumption tests in OLS linear regression. Specifically, I will discuss the differences between the assumptions of normality, heteroscedasticity, and multicollinearity.

Normality Assumption Test in Regression

The first assumption test I will discuss is the normality test. In linear regression analysis, the normality test is a critical stage. Make sure you don’t skip it.

One key point to understand is that this normality test differs from a normality test applied directly to the observed data. In linear regression analysis, the normality test examines the residuals.

Again, I emphasize: in linear regression analysis, the normality test focuses on the residuals. Why?

According to econometrics textbooks, one of the assumptions of linear regression requires the residuals to be normally distributed. When the residuals are not normally distributed, this assumption is violated, and the hypothesis tests and confidence intervals based on the estimates may no longer be reliable.

Another important point to understand is the definition of residuals. Residuals are the differences between the actual observed values and the predicted values.

In formula terms, a residual is the difference between the actual value of Y and the predicted value of Y (e = Y − Ŷ). Residual values can be calculated manually or obtained from statistical software.

If you calculate residuals manually, you first need to estimate the intercept and the regression coefficients. These values are required to compute the predicted Y values.

Once the predicted Y values are calculated, residuals can be determined by subtracting the predicted Y values from the actual Y values. The next step is to perform a normality test on the calculated residuals.
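
For readers who want to try this on a computer, here is a minimal sketch of the residual calculation (e = Y − Ŷ) in Python using statsmodels. The article does not tie itself to any particular software, so Python is only one possible choice, and the data set and variable names (y, x1, x2) are purely illustrative.

```python
# Minimal sketch: fit an OLS model and extract the residuals with Python's
# statsmodels. The data set and variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
df["y"] = 2.0 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(size=100)

X = sm.add_constant(df[["x1", "x2"]])   # design matrix with intercept column
model = sm.OLS(df["y"], X).fit()        # estimates intercept and coefficients

y_hat = model.predict(X)                # predicted Y values
residuals = df["y"] - y_hat             # actual Y minus predicted Y
# statsmodels exposes the same values directly as model.resid
```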

Several tests can be used to check residual normality. The most popular tests among researchers are the Kolmogorov-Smirnov test and the Shapiro-Wilk test.

Both tests generally lead to the same conclusion about residual normality. Their test statistics and p-values typically differ slightly, but the decision on whether to reject the null hypothesis is usually the same.

In the residual normality test, the null hypothesis can be stated as “residuals are normally distributed.” The alternative hypothesis (H1) is “residuals are not normally distributed.”

Using the Kolmogorov-Smirnov or Shapiro-Wilk test, a p-value is obtained as the basis for drawing conclusions. The decision criterion is as follows: if the p-value is less than or equal to 0.05, the null hypothesis is rejected.

Conversely, if the p-value is greater than 0.05, the null hypothesis is accepted. For example, if you perform a residual normality test and obtain a Kolmogorov-Smirnov p-value of 0.220, this value is greater than 0.05, so the null hypothesis is accepted.

Since the null hypothesis is accepted, it can be concluded that the residuals are normally distributed. This means that the assumption required for linear regression is satisfied in the given example.
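
As an illustration, the sketch below runs both tests on the residuals computed in the earlier example. Shapiro-Wilk comes from scipy; for the Kolmogorov-Smirnov approach, the Lilliefors variant in statsmodels is used because the residual mean and variance are estimated from the data. This is only one way to run the tests, continuing the hypothetical example above.

```python
# Continuing the hypothetical model from the previous sketch.
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

sw_stat, sw_p = stats.shapiro(residuals)            # Shapiro-Wilk test
ks_stat, ks_p = lilliefors(residuals, dist="norm")  # Kolmogorov-Smirnov (Lilliefors)

for name, p in [("Shapiro-Wilk", sw_p), ("Kolmogorov-Smirnov (Lilliefors)", ks_p)]:
    if p <= 0.05:
        print(f"{name}: p = {p:.3f} -> reject H0, residuals are not normally distributed")
    else:
        print(f"{name}: p = {p:.3f} -> accept H0, residuals are normally distributed")
```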

Heteroscedasticity Assumption Test in Regression

The heteroscedasticity test is another assumption test you need to perform to obtain the Best Linear Unbiased Estimator. In this test, you must ensure that the residual variance is constant.

One assumption in OLS linear regression requires constant residual variance, also known as homoscedasticity.

Therefore, it is essential to understand how to detect heteroscedasticity in a regression equation. If the detection shows that the residual variance is not constant, the regression equation exhibits heteroscedasticity.

In a regression equation with heteroscedasticity, that is, non-constant residual variance, the OLS coefficient estimates are no longer efficient and the standard errors become unreliable, even though the coefficients themselves remain unbiased. So, how do you detect heteroscedasticity?

Several tests can be used to detect heteroscedasticity, one of which is the Breusch-Pagan test.

Similar to the normality test, the heteroscedasticity test also requires a null hypothesis and an alternative hypothesis:

H0: Residual variance is constant (homoscedasticity)

H1: Residual variance is not constant (heteroscedasticity)

Suppose you perform a heteroscedasticity test using the Breusch-Pagan method and obtain a test statistic of 8.57 with a p-value of 0.159. Since the p-value is greater than 0.05, the null hypothesis is accepted.

Accepting the null hypothesis indicates constant residual variance or homoscedasticity. Hence, based on this test, the assumption required for OLS linear regression is satisfied.
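
Continuing the same illustrative Python sketch, the Breusch-Pagan test is available in statsmodels as het_breuschpagan. The statistic and p-value below depend on the simulated data, so they will not reproduce the 8.57 and 0.159 from the example; the decision rule, however, is the same.

```python
# Breusch-Pagan test on the model fitted in the earlier sketch.
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X)
print(f"Breusch-Pagan LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.3f}")

if lm_pvalue > 0.05:
    print("Accept H0: residual variance is constant (homoscedasticity)")
else:
    print("Reject H0: residual variance is not constant (heteroscedasticity)")
```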

Multicollinearity Assumption Test in Linear Regression

The multicollinearity test is another critical assumption test in linear regression analysis. It ensures no strong correlation exists between independent variables.

If the independent variables are strongly correlated, the regression equation suffers from multicollinearity, which inflates the variances of the coefficient estimates and makes them unstable. Therefore, conducting a multicollinearity test is vital to keep the results reliable.

This test applies only to multiple linear regression, not simple linear regression, as its purpose is to assess correlations among independent variables.

You might wonder how to test for multicollinearity in linear regression. One way is to compute the correlations between independent variables. However, the most popular method among researchers is to use the Variance Inflation Factor (VIF).

In multiple linear regression, the VIF can be calculated manually or analyzed using statistical software.

Typically, a VIF value below 10 is taken to indicate no serious multicollinearity; some sources apply a stricter cutoff of 5. Thus, the multicollinearity test is crucial to ensure there is no strong correlation among the independent variables.
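
To close the illustration, the VIF of each independent variable can be computed with statsmodels' variance_inflation_factor, sketched below for the same hypothetical model.

```python
# VIF for each independent variable in the model fitted earlier.
from statsmodels.stats.outliers_influence import variance_inflation_factor

exog = X.values
for i, name in enumerate(X.columns):
    if name == "const":          # skip the intercept column added by add_constant
        continue
    vif = variance_inflation_factor(exog, i)
    flag = "no serious multicollinearity" if vif < 10 else "possible multicollinearity"
    print(f"VIF for {name}: {vif:.2f} ({flag})")
```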

This article has explained the differences between normality, heteroscedasticity, and multicollinearity tests in linear regression analysis. I hope it benefits those who have yet to understand these assumption tests thoroughly. Thank you for reading, and stay tuned for the next article update!
